3 Tonality and Chroma
3.1 The equal-tempered scale
We now investigate the time-frequency structure related to the concepts of tonality and chroma.
Tonality is concerned with the distribution of notes at a given time.
In contrast, melody characterizes the variations of the spectrum as a function of time.
We first observe that pitch, the auditory sensation that is closely related to frequency, corresponds to a
logarithmic scale of the physical frequency. We have seen examples of such scales in the previous lab:
the Bark and the mel scales.
In this lab, we define another logarithmic scale, known as the tempered scale. First, we define a
reference frequency $f_0$ associated with a reference pitch. While it is customary to use the frequency
440 Hz associated with the pitch A4, we will use $f_0 = 27.5$ Hz (this is known as A0, the lowest note on a piano).
The tempered scale introduces 12 frequency intervals, known as semitones, between this reference
frequency $f_0$ and the next frequency also perceived as an A, $2f_0$. As a result, a frequency $f_p$
that is $p$ semitones away from $f_0$ is given by
$$f_p = f_0 \, 2^{p/12} \tag{12}$$
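As a quick numerical check of Eq. (12), the following minimal Python sketch evaluates the tempered-scale frequency for a given number of semitones. The function name `semitone_frequency` and its default argument are illustrative choices, not part of the lab.

```python
def semitone_frequency(p, f0=27.5):
    """Frequency in Hz of the note p semitones above the reference f0, per Eq. (12).

    f0 defaults to 27.5 Hz (A0), the reference used in this lab.
    """
    return f0 * 2.0 ** (p / 12.0)


# 12 semitones above A0 is one octave up (A1); 48 semitones is four octaves up (A4).
a1 = semitone_frequency(12)
a4 = semitone_frequency(48)
print(a1, a4)
```

With $p = 48$ the exponent is exactly 4, so the result is $16 \times 27.5 = 440$ Hz, the customary A4 reference.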
An interval of 12 semitones, $[f_p, f_{p+12})$, corresponds to an octave. The same notes (e.g. A, or C#) are
always separated by a whole number of octaves. For instance, the two previous A notes, A0 and A4, are
separated by 4 octaves, since $440 = 2^4 \times 27.5$. The notion of note can be formally modeled by the concept of chroma.
3.2 Chroma
The chroma is associated with the relative position of a note inside an octave. It is a relative
measurement that is independent of the absolute pitch. We describe an algorithm to compute a chroma
feature vector for a frame n. To simplify the notation, we drop the dependency on the frame index, n, in
this discussion.
We first present the idea informally, and then we make our statement precise. We are interested in
mapping a given frequency, $f$, to a note. Given a reference frequency $f_0$, we can use the following
equation to compute the distance $p$ between $f$ and $f_0$, in semitones:
$$p = \operatorname{round}\left(12 \log_2 (f/f_0)\right) \tag{13}$$
We note that $f$ is usually not exactly equal to $f_0 \, 2^{p/12}$ for an integer value of $p$, which is why
we round $12 \log_2 (f/f_0)$ to the closest integer.
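The rounding step of Eq. (13) can be sketched in a few lines of Python. The function name `semitone_index` is hypothetical, and the check for positive frequencies is an added assumption so that the logarithm is defined. Note that Python's built-in `round` sends exact half-integers to the nearest even integer, which only matters for a frequency lying exactly halfway between two semitones.

```python
import math


def semitone_index(f, f0=27.5):
    """Nearest integer number of semitones between f and the reference f0, per Eq. (13)."""
    if f <= 0 or f0 <= 0:
        raise ValueError("frequencies must be positive")
    return round(12 * math.log2(f / f0))


# A slightly sharp A4 (445 Hz) still maps to the same semitone index as 440 Hz.
print(semitone_index(440.0), semitone_index(445.0))
```

Here 445 Hz gives $12 \log_2(445/27.5) \approx 48.2$, which rounds to 48, the same index as an in-tune A4.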
We can then map this index $p$ to a note, or “chroma”, $c$, within an octave, using
$$c = p \bmod 12 \tag{14}$$
or equivalently,